Modern society's increasing dependency on online tools for both work and recreation opens up
unique opportunities for the study of social interactions. We present a large survey of online
exchanges, or conversations, on Twitter, collected across six months and involving 1.7 million
individuals. We use it to test the theoretical cognitive limit on the number of stable social
relationships known as Dunbar's number. We find that users can entertain a maximum of 100 to 200
stable relationships, in support of Dunbar's prediction. The "economy of attention" is limited in
the online world by cognitive and biological constraints, as predicted by Dunbar's theory. Inspired
by this empirical evidence, we propose a simple dynamical mechanism, based on finite priority
queuing and time resources, that reproduces the observed social behavior.



I. INTRODUCTION 

Modern society's increasing dependence on online 
tools for both work and recreation has generated an 
unprecedented amount of data regarding social behavior. 
While this dependence has made it possible to redefine 
the way we study social behavior, new online commu- 
nication tools and media are also constantly redefining 
social acts and relations. Recently, the divide between 
the physical world and online social realities has been 
blurred by the new possibilities afforded by real-time 
communication and broadcasting, which appear to 
greatly enhance our social and cognitive capabilities in 
establishing and maintaining social relations. The combination
of mobile devices with new tools like Twitter,
Foursquare, Blippy, Tumblr, Yahoo! Meme, Google
Hot spot, etc., is defining a new era in which we can be
continuously connected with an ever-increasing number
of individuals through constant digital communication
composed of small messages and bits of information.
Thus, while new data and computational approaches to
social science [1-3] finally enable us to answer a large
number of long-standing questions [4-6], we are also
increasingly confronted with new questions related to 
the way social interaction and communication change 
in online social environments: What is the impact that 
modern technology has on social interaction? How do 
we manage the ever-increasing amount of information 
that demands our attention? In 1992, R. I. M. Dunbar 
[7] measured the correlation between neocortical volume 
and typical social group size in a wide range of primates 
and human communities. The result was as surprising 
as it was far-reaching. The limit imposed by neocortical 
processing capacity appears to define the number of 
individuals with whom it is possible to maintain stable 
interpersonal relationships. Therefore, the size of the 
brain's neocortex represents a biological constraint on 
social interaction that limits humans' social network size 
to between 100 and 200 individuals [8], i.e. Dunbar's
number. McCarty et al. [9] independently attempted to
measure typical group size using two different methods
and obtained a number of 291, roughly twice Dunbar's
estimate.

Biological constraints on social interaction go along with
other real-world physical limitations. After all, a person's
time is finite and each person must make her own choices
about how best to use it given the priority of personal
preferences, interests, needs, etc. The idea that attention
and time are scarce resources led H. Simon [10] to apply
standard economic tools to study these constraints
and introduce the concept of an Attention Economy
with mechanisms similar to our everyday monetary
economy. The increasingly fast pace of modern life and
the overwhelming availability of information have brought
a renewed interest in the study of the economy of
attention, with important applications both in business
[11] and in the study of collective human behavior [12]. On
the one hand, it can be argued that microblogging tools
facilitate the way we handle social interactions and
that this results in an online world where human social
limits are finally lifted, making predictions such as
Dunbar's number obsolete. On the other hand, microblogging
and online tools might be analogous to a pocket
calculator that, while speeding up the way we can do
simple math, does not improve our cognitive capabilities
for mathematics. In this case, the basic cognitive
limits to social interactions are not surpassed in the
digital world. In this paper we show that the latter
hypothesis is supported by the analysis of real-world
data that identify the presence of Dunbar's limit in Twitter,
one of the most successful online microblogging tools.



II. THE DATASET 

Having been granted temporary access to Twitter's firehose,
we mined the stream for over 6 months to identify a
large sample of active user accounts. Using the API, we
then queried for the complete history of 3 million users,
resulting in a total of over 380 million individual tweets
covering almost 4 years of user activity on Twitter. Table I
provides some basic statistics about our dataset.
Here we analyze this massive dataset of Twitter conversations
accrued over the span of six months and investigate
the possibility of deviation from Dunbar's number
in the number of stable social relations mediated by this
tool. The pervasive nature of Twitter, along with its
widespread adoption by all layers of society, makes it an
ideal proxy for the study of social interactions [13-16].
We have analyzed over 380 million tweets, from which
we were able to extract 25 million conversations. Each
Twitter conversation takes on the form of a tree of tweets,
where each tweet comes as a reply to another. By projecting
this forest of trees onto the users that author each
tweet, we are able to generate a weighted social network
connecting over 1.7 million individuals (see Figure 1).

FIG. 1: Reply trees and user network. A) The set of all trees is a forest. Each time a user replies, the corresponding tweet
is connected to another one, resulting in a tree structure. B) Combining all the trees in the forest and projecting them onto
the users results in a directed and weighted network that can be used as a proxy for relationships between users. The number
of outgoing (incoming) connections of a given user is called the out (in) degree and is represented by $k^{out}$ ($k^{in}$). The number
of messages flowing along each edge is called the weight, $\omega$. The probability density function $P(k^{out})$ ($P(k^{in})$) indicates the
probability that any given node has out (in) degree $k^{out}$ ($k^{in}$); it is called the out (in) degree distribution and is a measure
of node diversity on the network.



Tweets              381,652,990
Timelined Users     3,006,180
Scraping Period     Nov. 20, 2008 - May 29, 2009
Time span           4 years

Trees               25,273,871
Tweets in Trees     81,728,252
Users in Trees      1,720,320
User-User Edges     68,459,592

TABLE I: Dataset Statistics.



A. Tree Identification and Projection 

All tweets in our dataset that constituted a reply were
collected. Each such tweet contains information not only
about the id of the original tweet but also about the user that
sent it. Using this information, each reply tweet maps
directly to a directed edge. Individual trees can be identified
by using depth-first search (DFS) [17] to identify connected
components in the resulting tweet-tweet graph. To ensure
that the full tree is found and not just a part of
it, we treat each link as undirected for the purposes of
this identification. In this way we are able to extract the
complete tree even if we happen to start on one of the
leaves. For each tree the root is then found by locating
the node with $k^{in} = 0$, and distances from the root are
measured by rerunning the DFS algorithm starting from
the root and respecting the direction of each edge.
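
The two-pass procedure described above can be sketched as follows. This is a minimal illustration, not the code used for the analysis: the function name `find_reply_trees` and the input format (a list of `(reply_id, parent_id)` pairs) are assumptions made for the example, and it assumes every tweet replies to at most one other tweet, so that each connected component is a tree.

```python
from collections import defaultdict

def find_reply_trees(replies):
    """replies: iterable of (reply_id, parent_id) tweet-id pairs (assumed format).

    Returns a list of (root, depth) pairs, one per tree, where depth maps
    each tweet id in the tree to its distance from the root.
    """
    children = defaultdict(list)    # directed links: original tweet -> replies
    undirected = defaultdict(set)   # same links with direction ignored
    has_parent = set()              # tweets with an incoming link (k_in > 0)
    for reply_id, parent_id in replies:
        children[parent_id].append(reply_id)
        undirected[parent_id].add(reply_id)
        undirected[reply_id].add(parent_id)
        has_parent.add(reply_id)

    seen = set()
    trees = []
    for start in undirected:
        if start in seen:
            continue
        # First pass: iterative DFS over undirected links, so the whole
        # component is collected even when `start` is a leaf.
        component, stack = [], [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in undirected[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        # The root is the tweet in the component with k_in = 0.
        root = next(n for n in component if n not in has_parent)
        # Second pass: DFS from the root, respecting edge direction,
        # records the distance of every tweet from the root.
        depth, stack = {root: 0}, [root]
        while stack:
            node = stack.pop()
            for child in children[node]:
                depth[child] = depth[node] + 1
                stack.append(child)
        trees.append((root, depth))
    return trees
```

An iterative stack is used instead of recursion so that very deep conversations do not hit Python's recursion limit.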

The underlying reply network can be extracted by projecting
the tweet trees to a user graph: user A is connected
to user B by a directed outgoing edge if A replied
to a tweet sent by B. Over time, any pair of users can exchange
multiple replies either in a single "conversation"
(tree) or through multiple conversations. The number of
messages sent from one user to another is used as the
weight of the corresponding directed edge and is taken
to signify the strength of the connection between the two
users, with higher weights representing stronger connections.
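
As an illustration of this projection, the sketch below counts replies between pairs of users. The names `project_to_user_graph` and `author`, and the input format, are hypothetical; skipping self-replies is an assumption of this example and is not stated in the text.

```python
from collections import Counter

def project_to_user_graph(replies, author):
    """Project tweet-level reply links onto users.

    replies: iterable of (reply_id, parent_id) tweet-id pairs (assumed format).
    author:  dict mapping tweet id -> user who wrote it (assumed format).
    Returns a Counter mapping (replier, recipient) -> number of replies,
    i.e. the weight of the directed edge replier -> recipient.
    """
    weights = Counter()
    for reply_id, parent_id in replies:
        a, b = author[reply_id], author[parent_id]
        if a != b:  # assumption: a user replying to herself is not a social tie
            weights[(a, b)] += 1
    return weights
```

Replies exchanged in different conversations accumulate on the same edge, since only the pair of authors matters.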



B. Online conversations 

Each reply creates a connection between two tweets
and their authors, so we can define a conversation as
a branching process of consecutive replies, resulting in
a tree of tweets. From our dataset we extracted and
analyzed a forest of over 25 million trees. Trees vary
broadly in size and shape, with most conversations remaining
small while a few grow to include thousands of
tweets and hundreds of users, as shown in Figure 2.

FIG. 2: Tree characterization. A) Distribution of the number of tweets in a tree. B) Distribution of the number of shells. C)
Distribution of the number of users. D) Tree size vs depth. The broad-tailed nature of all of these quantities indicates the
diversity of behaviors displayed by the users in our dataset.

A directed user-user network can be built by projecting
conversation trees to detail how users interact and establish
relationships among themselves. Bidirectional edges
signify mutual interactions, with stronger weights implying
a more frequent or prolonged interaction between two
individuals.

All of our analysis will be performed on this user-user
conversation network. We consider a user to have out-degree
$k^{out}$ if he or she replies to $k^{out}$ other users, regardless
of the number of explicit followers or friends the
given user has. By focusing on direct interactions we are
able to eliminate the confounding effect of users that have
tens or hundreds of thousands of followers with whom
they have no contact and are able to focus on real person-to-person
interactions [13].

III. DUNBAR'S NUMBER IN OUR DATA 

In the generated network each node corresponds to a 
single user. The out-degree of the nodes is the number of 
users the node replies to, while the in-degree corresponds 



to the number of different nodes it receives a reply from. 
When A follows B, A subscribes to receive all the up- 
dates published by B. A is then one of B's followers and 
B is one of A's friends. Previous studies have mostly
focused on the network induced by this follower-friend
relationship [15], [18-20]. In any study about stable social
relations in online media, as indicated by studies
about Dunbar's number, it is important to discount occasional
social interactions. For this reason we focus on
stronger relationships in our study [13], considering just
active communication from one user to another by means
of a genuine social interaction between them. In our network
[21, 22] we introduce the weight $\omega_{ij}$ of each edge,
defined as the number of times user $i$ replies to user $j$, as
a direct measurement of the interaction strength between
two users; stable relations will be those with a large
weight. A simple way to measure this effect is to calculate
the average weight of each interaction by a user as a function
of his total number of interactions. Users that have
only recently joined Twitter will have few friends and
very few interactions with them. As time goes by,
users will acquire more and more friends, but the number
of replies that they send to other users will increase consistently
only in stable social interactions. Eventually, a
point is reached where the number of contacts surpasses
the user's ability to keep in contact with them.
This saturation process will necessarily lead to some relationships
being more valued than others. Each individual
tries to optimize her resources by prioritizing these interactions.
To quantify the strength of these interactions,
we studied the quantity $\omega_i^{out}$, defined as the average social
strength of the relationships actively initiated by user $i$:

$$\omega_i^{out}(T) = \frac{1}{k_i^{out}} \sum_{j} \omega_{ij}(T) \tag{1}$$



This quantity corresponds to the average weight per
outgoing edge of each individual, where $T$ represents the
time window for data aggregation. We measure this
quantity in our data set as shown in Figure 3. The data
show that this quantity reaches a maximum between 100
and 200 friends, in agreement with Dunbar's prediction
(see Figure 3A). This finding suggests that even though
modern social networks help us to log all the people with
whom we meet and interact, they are unable to overcome
the biological and physical constraints that limit stable
social relations. In Figure 3B, we plot $\rho$, the number of
reciprocated connections, as a function of the
in-degree $k^{in}$. $\rho$ saturates between 200 and 300 even
though the number of incoming connections continues to
increase. This saturation indicates that after this point
the system is in a new regime; new connections can be reciprocated,
but at a much smaller rate than before. This
can be accounted for by spurious exchanges we make with
some contacts with whom we do not maintain an active
relationship.
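
To make the two measured quantities concrete, the following sketch computes, for each user, the out-degree together with the average weight per outgoing edge, and the in-degree together with the number of reciprocated connections. The function names and the edge-weight dictionary format are assumptions for the example; binning users by degree and taking averages, as done for the figures, is left out.

```python
from collections import defaultdict

def out_strength(weights):
    """weights: dict mapping (i, j) -> number of replies from i to j.

    Returns dict i -> (k_out, omega_out), where omega_out is the total
    outgoing weight of user i divided by the out-degree k_out.
    """
    total = defaultdict(int)
    k_out = defaultdict(int)
    for (i, _j), w in weights.items():
        total[i] += w
        k_out[i] += 1
    return {i: (k_out[i], total[i] / k_out[i]) for i in k_out}

def reciprocated(weights):
    """Returns dict i -> (k_in, rho), where rho counts the in-neighbours
    of user i that user i has also replied to."""
    k_in = defaultdict(int)
    rho = defaultdict(int)
    for (j, i) in weights:
        k_in[i] += 1
        if (i, j) in weights:
            rho[i] += 1
    return {i: (k_in[i], rho[i]) for i in k_in}
```

For example, a user who sent 2 replies to one contact and 4 to another has out-degree 2 and an average out-weight of 3.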



IV. THE MODEL 

Let us consider a static network $\mathcal{G}$, characterized by a
degree distribution $P(k)$. Each user (node) $i$ is connected
to all its nearest neighbors $j$ through two weighted directed
edges, $i \to j$ and $j \to i$, so that:

$$k_i^{out} = k_i^{in} = \frac{k_i}{2}, \tag{2}$$

where $k_i^{out}$ is the out-degree, the number of outgoing
links, and $k_i^{in}$ is the in-degree, the number of incoming
links, of user $i$. Each node uses its out-links to
send messages to its contacts and receives messages
from its contacts through its in-links. In this way it is easy
to distinguish between incoming and outgoing messages.
Whenever a message is sent from node $i$ to node $j$, the
weight of the $(i \to j)$ edge, $\omega_{ij}$, is increased by one. The total
number of messages sent by each user is given by the
sum over all of its outgoing edges. Users communicate
with each other by replying to messages. The assumption
of our model is that biological and time constraints
are the key ingredients in fixing Dunbar's number.
We model this by considering that when user $i$ receives a
message it places it in an internal queue that allows up
to $q_{max,i}$ messages to be handled at each time step. In
the presence of finite resources each agent has to decide
which messages are the most important to
answer. We set the priority of each message to be proportional
to the total degree of the sender $j$. For each
user, the observable we study is the average number of interactions
per connection, $\omega_i^{out}(T)$, as defined in Eq. (1). At each
time step each agent goes through its queue and performs
the following simple operations:

• The agent replies to a random number $S_t$ of messages
between 0 and the number of messages $q_i$
present in the queue. The messages to be replied
to are selected proportionally to the priority of the
sending agent (its total degree). A message is then
sent to $j$, the node we are replying to, and the corresponding
weight $\omega_{ij}$ is incremented by one.

• Messages the agent has replied to are deleted from
the queue and all incoming messages are added to
the queue in a prioritized order until the number of
messages reaches $q_{max}$. Messages in excess of $q_{max}$
are discarded.

The dynamic process is then repeated for a total number
of time steps $T$. In order to initialize the process
and take into account the effect of endogenous random
effects, each agent can broadcast a message to all of its
contacts with some small probability $p$. One may think of
this message as a common status change, or a TV appearance,
news story, or any other information not necessarily
authored by the sending agent. Since these messages are
not specifically directed from one user to another, they do
not contribute to the weight of the edges through which
they flow. We have studied this simple model by using an
underlying network of N = 10^ nodes and different scale-free
topologies. For each simulation T = 2 x 10^ time
steps have been considered and the plots are made by evaluating
the medians among at least 10^ runs. In Figure 4 we
report the results of simulations in a directed heavy-tailed
network with a power-law tail similar to those observed
for the measured network [19]. The figures clearly show a
behavior compatible with the empirical data. The peak
that maximizes the information output per connection
is linearly proportional to $q_{max}$, supporting the idea that
the physical constraints entailed in the queue's maximum
capacity, along with the prioritization that gives importance
to popular senders, are at the origin of the observed
behavior. We have also performed an extensive sensitivity
analysis on the broadcasting probability $p$ and the time
scale $T$, and have investigated the effect of agent heterogeneity
by studying populations where each agent's
capacity $q_{max,i}$ is randomly distributed according to a
Gaussian distribution centered around $q_{max}$ with standard
deviation $\sigma$.
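
A minimal sketch of these dynamics is given below, as one possible reading of the rules and not the simulation code behind the figures. The function name `simulate_queue_model`, the synchronous update (all messages produced in a step are delivered at the end of that step), the use of a single capacity `q_max` for every agent, and the choice to keep already queued messages ahead of new arrivals are all assumptions of this example.

```python
import random
from collections import defaultdict

def simulate_queue_model(neighbors, q_max, p, steps, seed=0):
    """neighbors: dict node -> list of contacts; assumed symmetric, with
    every node present as a key. Returns dict (i, j) -> replies from i to j."""
    rng = random.Random(seed)
    degree = {i: len(contacts) for i, contacts in neighbors.items()}
    queue = {i: [] for i in neighbors}   # senders of the pending messages
    weight = defaultdict(int)
    for _step in range(steps):
        inbox = defaultdict(list)        # messages delivered at end of step
        for i in neighbors:
            # Reply to a random number S_t of queued messages, picking each
            # one with probability proportional to the sender's degree.
            s_t = rng.randint(0, len(queue[i]))
            for _reply in range(s_t):
                priorities = [degree[j] for j in queue[i]]
                idx = rng.choices(range(len(queue[i])), weights=priorities)[0]
                j = queue[i].pop(idx)    # answered messages leave the queue
                weight[(i, j)] += 1
                inbox[j].append(i)
            # Broadcast to all contacts with probability p; broadcasts do
            # not contribute to the edge weights.
            if rng.random() < p:
                for j in neighbors[i]:
                    inbox[j].append(i)
        for i, senders in inbox.items():
            # Add incoming messages in priority order up to the capacity;
            # anything beyond q_max is discarded.
            senders.sort(key=lambda j: degree[j], reverse=True)
            queue[i] = (queue[i] + senders)[:q_max]
    return dict(weight)
```

With `p = 0` no message is ever created, so the broadcast term is what starts the dynamics. Heterogeneous capacities could be modeled by replacing `q_max` with a per-node dictionary.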
FIG. 3: A) Out-weight as a function of the out-degree. The average weight of each outward connection gradually increases
until it reaches a maximum near 150-200 contacts, signaling that a maximum level of social activity has been reached. Above
this point, an increase in the number of contacts can no longer be sustained with the same amount of dedication to each. The
red line corresponds to the average out-weight, while the gray shaded area illustrates the 50% confidence interval. B) Number of
reciprocated connections, $\rho$, as a function of $k^{in}$. As the number of people demanding our attention increases, it will eventually
saturate our ability to reply, leading to the flat behavior displayed in the dashed region.

FIG. 4: Result of running our model on a heterogeneous
network made of N = 10^ nodes with degree distribution
$P(k)$ with $\gamma = -2.4$ and $\sigma = 10$. Different curves correspond
to different queue sizes. The inset shows the linear
dependence of the peak on the queue size $q$. Each curve is
the median of 10^ to 2 x 10^ runs of T = 2 x 10^ time steps.

A. Effect of the time window T

One of the parameters of our model is the time window
$T$ during which we study the dynamics. This parameter
regulates the maximum number of messages that will
circulate in the network. In the first time steps the first
messages start being sent among users and the
queues start to get messages in and out. After a while
we can expect that the system reaches a dynamical equilibrium.
In Figure 5A we show the behavior of our
observable $\omega^{out}$ for different values of $T$; in particular we
chose T = 10^, 1.5 x 10^, 2 x 10^. The effect of time is
clearly a shift on the y axis and a small change in the
position of the peak. The first effect is due to the fact
that the number of messages circulating in the system
increases linearly with $T$. The second effect is due to the
reduction of fluctuations when more messages are sent:
the peak becomes clearer and better defined.


B. Effect of broadcast probability p 

The effect of the broadcast probability differs from
the effect of the time window $T$. First of all,
our observable $\omega^{out}$ is linearly proportional to $T$ in all
regimes of $k^{out}$; this is not true for $p$. The effect of $p$ is
crucial for users with a small number of contacts. As
$p$ increases they will receive more messages and their activity
will increase too; this does not occur in the other
limit. When saturation takes place, $\omega^{out}$ becomes
completely independent of $p$. As shown in detail in the
mean-field approach (Section IV D), for values of $k^{out}$
small with respect to the queue size, $\omega^{out}$ scales linearly
with $p$. Instead, for a number of contacts much bigger
than the queue size, $\omega^{out}$ is independent of $p$. These considerations
are validated by our simulations, as shown in
Figure 5B. We see a clear dependence on $p$ for small
values of $k^{out}$ and the same behavior for all $p$ at bigger values
of $k^{out}$.



C. Effect of network's properties 

Inspired by several studies [13], [15], [19] we fix the baseline
of our model using scale-free networks. It is important
then to study how differences in the network
structure affect the results. In this section we consider
the effect of the exponent $\gamma$. As shown in Figure 5C,
we run our model on top of scale-free networks with
$\gamma = -2.2, -2.4, -2.6, -2.8$. As is clear from the plot, for
smaller values of $\gamma$ (bigger in absolute value) gaps
in $k^{out}$ start to emerge. These are due to the network
structure. The shape and position of the peak are the
same for all the curves. The differences are evident only
in the peak height, which increases as $\gamma$ decreases. This is
due to the different redistribution of degrees and to the fact
that with small $\gamma$ the selection effect is more and more
important. So we can say that the results are robust with respect to
$\gamma$.



D. Single user: analytical approach 

In order to get a better understanding of the mechanisms
we describe, we analyzed, in a mean-field approach,
the behavior of a single user $i$.

Let us focus on a user $i$ characterized by degree $k_i$ and
queue size $q_{max,i}$. $k_i^{out} = k_i/2$ are the outgoing links that it uses
to send messages to its $k_i/2$ contacts, and $k_i^{in} = k_i/2$ are
the incoming links through which it receives messages from
its contacts. We set as $k_j$ the priority of each neighbor,
which we extract from a distribution $P(k)$. The rules of the
model that we described in the previous sections are applied
for $T$ time steps. The probability that a neighbor
$j$ will send a message to the user $i$ is:

$$p_{ji} = p + \frac{k_i}{\langle k \rangle k_j}, \tag{3}$$

where $p$ is the broadcast probability. We can evaluate the
average number of messages that the user will receive at
each time step $t$:

$$\langle R \rangle = \sum_{j=1}^{k_i^{in}} p_{ji} = p\,k_i^{in} + \frac{k_i}{\langle k \rangle} \sum_{j=1}^{k_i^{in}} \frac{1}{k_j}. \tag{4}$$

Since the $k_j$ are all extracted from the same distribution, the sum
scales linearly with the number of elements, $k_i^{in}$. We
can write:

$$\langle R \rangle = p\,k_i^{in} + c\,\frac{k_i^2}{2 \langle k \rangle}, \tag{5}$$

where $c$ is a constant fixed by the distribution. Since the
priority of the user is proportional to its degree, as is
the number of incoming connections, the number of
messages it gets scales as the square of its degree.
Two different regimes are easily found: $k_i \ll q_{max,i}$ and
vice versa.

In the first case the user is not popular, so the number
of messages that the user will receive is small. In
principle it can reply to all of them at each time step.
We can assume that in this regime its queue is never
completely full. We will refer to $R_t$ as the number of
messages that the user receives at time step $t$. After
one time step the number of replies is:

$$S_1 = R_1 \xi_1, \tag{6}$$

where $\xi_1$ is a random number uniformly distributed between
0 and 1. The number of messages $S_2$ that the user
sends at the second time step is a random fraction of the
messages present in its queue:

$$S_2 = [R_1 (1 - \xi_1) + R_2]\,\xi_2. \tag{7}$$

For $t = 3$ we get:

$$S_3 = \{[R_1 (1 - \xi_1) + R_2](1 - \xi_2) + R_3\}\,\xi_3 = [R_1 (1 - \xi_1)(1 - \xi_2) + R_2 (1 - \xi_2) + R_3]\,\xi_3, \tag{8}$$

and so on. We can approximate these equations using
the average number of received messages $\langle R \rangle$. For
general $t$ it is possible to show that:

$$S_t = \xi_t \sum_{j=1}^{t} R_j \prod_{i=j}^{t-1} (1 - \xi_i) \sim \langle R \rangle\,\xi_t\,[2 - \xi_{t-1} + O(\xi^2)] = \langle R \rangle\,[2 \xi_t - \xi_t \xi_{t-1} + O(\xi^3)]. \tag{9}$$

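The total number of messages sent, which is the numerator of
our measure $\omega_i^{out}$, is the sum of all the $S_t$:

$$\sum_{t=0}^{T} S_t \sim T \langle R \rangle, \tag{10}$$

considering that each sum of products of random numbers is
of order $T$. We can then write:

$$\omega_i^{out}(T) \sim \frac{T}{k_i^{out}} \left[ p\,k_i^{in} + c\,\frac{k_i^2}{2 \langle k \rangle} \right]. \tag{11}$$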



In this regime we get a linear increase with $k^{out}$ of the
average number of replies per connection. As shown in
Figure 6, this is confirmed in the simulations.

FIG. 5: A) $\omega^{out}$ as a function of $k^{out}$ for $q_{max} = 100$, $\sigma = 10$, scale-free network with $\gamma = -2.4$ and different values of $T$. We
present the medians over 500 runs. B) $\omega^{out}$ as a function of $k^{out}$ for $q_{max} = 100$, $\sigma = 10$, scale-free network with $\gamma = -2.4$,
T = 10 and different values of $p$. We present the medians over 500 runs. C) $\omega^{out}$ as a function of $k^{out}$ for $q_{max} = 100$, $\sigma = 10$,
T = 10^, p = 5 x 10^-, scale-free network with different values of $\gamma$. We present the median over 500 runs.

FIG. 6: Results for the single user and different values of $\sigma$,
the inter-user queue size variance. We fixed the average queue
size at $q_{max,i} = 50$ and extracted the priorities of user neighbors
from a power-law statistical distribution with exponent
$\gamma = -2.1$. For each $k_i$ we run T = 500 time steps and present
the medians among 10 runs.

The other regime is found for a number of contacts bigger
than the queue size. In this case the user is very popular:
at each time step it gets a lot of messages and is not
able to handle all of them. In this limit the saturation process
takes place and the user will reply to just a small fraction of
the total number of messages, prioritizing them. At each
time step this number is a random variable uniformly
distributed between 0 and $q_{max,i}$. We have then:



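$$\omega_i^{out}(T) = \frac{1}{k_i^{out}} \sum_{t=0}^{T} \xi_t\,q_{max,i}. \tag{12}$$

The $\xi_t$ are random variables uniformly distributed between
0 and 1. At each time step the number of replies is
a random fraction of the queue size. For $T$ large enough
we get:

$$\omega_i^{out}(T) \sim \frac{T}{2 k_i^{out}}\,q_{max,i}. \tag{13}$$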



In this regime we then get a different scaling behavior,
typical of saturation problems. As shown in Figure 6,
these arguments are in perfect agreement with the numerical
results.

We have shown two different regimes: a linearly increasing
behavior and a decreasing one. Between these
opposite cases we find a maximum of the function.
The position of this peak is in general a function of the
queue size.
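
The two regimes can be checked numerically with a small single-user sketch. It is a simplification made for illustration, not the simulation behind Figure 6: the user receives a fixed number of messages per step (`received_per_step`, a hypothetical parameter standing in for the average number of received messages), message counts are treated as real numbers, and sender priorities are ignored.

```python
import random

def single_user_replies(received_per_step, q_max, steps, seed=0):
    """Total number of replies sent by one user over `steps` time steps.

    At each step the incoming messages are added to the queue, anything
    beyond the capacity q_max is discarded, and the user answers a uniform
    random fraction xi_t of what is in the queue.
    """
    rng = random.Random(seed)
    queued, sent = 0.0, 0.0
    for _ in range(steps):
        queued = min(queued + received_per_step, q_max)  # excess is dropped
        s_t = rng.random() * queued                      # S_t = xi_t * queue
        queued -= s_t
        sent += s_t
    return sent

T = 20000
# Unsaturated regime: the queue never fills, so nearly every message is
# eventually answered and the total grows like T times the arrival rate.
low = single_user_replies(received_per_step=1.0, q_max=50, steps=T)
# Saturated regime: the queue is always full, so the total grows like
# T * q_max / 2 regardless of how many messages arrive.
high = single_user_replies(received_per_step=1000.0, q_max=50, steps=T)
print(low / T, high / (T * 50 / 2))  # both ratios should be close to 1
```

Dividing either total by the out-degree gives the average number of replies per connection, which grows in the first regime (where the arrival rate scales as the square of the degree) and decays as the inverse of the out-degree in the second.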



V. CONCLUSIONS 

Social networks have changed the way we communicate.
It is now easy to be connected with a huge
number of other individuals. In this paper we show that
social networks did not change human social capabilities.
We analyze a large dataset of Twitter conversations
collected across six months and involving millions of individuals
to test the theoretical cognitive limit on the number
of stable social relationships known as Dunbar's number.
We found that even in the online world cognitive
and biological constraints hold, as predicted by Dunbar's
theory, limiting users' social activities. We propose a simple
model for users' behavior that includes finite priority
queuing and time resources and that reproduces the observed
social behavior. This simple model offers a basic explanation
of a seemingly complex phenomenon observed in the
empirical patterns on Twitter data and offers support to
Dunbar's hypothesis of a biological limit to the number
of relationships.